236 research outputs found
Decentralized Collaborative Learning Framework for Next POI Recommendation
Next Point-of-Interest (POI) recommendation has become an indispensable
functionality in Location-based Social Networks (LBSNs) due to its
effectiveness in helping people decide the next POI to visit. However, accurate
recommendation requires a vast amount of historical check-in data, thus
threatening user privacy as the location-sensitive data needs to be handled by
cloud servers. Although there have been several on-device frameworks for
privacy-preserving POI recommendations, they are still resource-intensive when
it comes to storage and computation, and show limited robustness to the high
sparsity of user-POI interactions. On this basis, we propose a novel
decentralized collaborative learning framework for POI recommendation (DCLR),
which allows users to train their personalized models locally in a
collaborative manner. DCLR significantly reduces the local models' dependence
on the cloud for training, and can be used to expand arbitrary centralized
recommendation models. To counteract the sparsity of on-device user data when
learning each local model, we design two self-supervision signals to pretrain
the POI representations on the server with geographical and categorical
correlations of POIs. To facilitate collaborative learning, we innovatively
propose to incorporate knowledge from either geographically or semantically
similar users into each local model with attentive aggregation and mutual
information maximization. The collaborative learning process makes use of
communications between devices while requiring only minor engagement from the
central server for identifying user groups, and is compatible with common
privacy preservation mechanisms like differential privacy. We evaluate DCLR
with two real-world datasets, where the results show that DCLR outperforms
state-of-the-art on-device frameworks and yields competitive results compared
with centralized counterparts. Comment: 21 pages, 3 figures, 4 tables
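The attentive aggregation mentioned in the abstract above can be illustrated with a minimal sketch. It is not the DCLR implementation: the dot-product scoring, the softmax weighting, and the name `attentive_aggregate` are all assumptions made for illustration, and the mutual information maximization term is left out.

```python
import math


def attentive_aggregate(local, neighbors):
    """Blend a local parameter vector with neighbor vectors.

    Hypothetical helper: attention scores are plain dot products with the
    local vector, normalized by a softmax (an assumption, not DCLR's rule).
    """
    candidates = [local] + neighbors
    # Score every candidate (including the local model) against the local model.
    scores = [sum(a * b for a, b in zip(local, c)) for c in candidates]
    # Numerically stable softmax over the scores.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of the candidate vectors, one dimension at a time.
    merged = [
        sum(w * c[i] for w, c in zip(weights, candidates))
        for i in range(len(local))
    ]
    return merged, weights


# Toy example: one similar neighbor and one dissimilar neighbor.
local = [1.0, 0.0]
neighbors = [[0.9, 0.1], [-1.0, 0.0]]
merged, weights = attentive_aggregate(local, neighbors)
```

In this toy setting the dissimilar neighbor receives the smallest weight, so it contributes least to the merged vector.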
Manipulating Federated Recommender Systems: Poisoning with Synthetic Users and Its Countermeasures
Federated Recommender Systems (FedRecs) are considered privacy-preserving
techniques to collaboratively learn a recommendation model without sharing user
data. Since all participants can directly influence the systems by uploading
gradients, FedRecs are vulnerable to poisoning attacks of malicious clients.
However, most existing poisoning attacks on FedRecs either rely on some
prior knowledge or are less effective. To reveal the real vulnerability of
FedRecs, in this paper, we present a new poisoning attack method to manipulate
target items' ranks and exposure rates effectively in the top-K
recommendation without relying on any prior knowledge. Specifically, our attack
manipulates target items' exposure rate by a group of synthetic malicious users
who upload poisoned gradients considering target items' alternative products.
We conduct extensive experiments with two widely used FedRecs (Fed-NCF and
Fed-LightGCN) on two real-world recommendation datasets. The experimental
results show that our attack can significantly improve the exposure rate of
unpopular target items with far fewer malicious users and fewer global
epochs than state-of-the-art attacks. In addition to disclosing the security
hole, we design a novel countermeasure for poisoning attacks on FedRecs.
Specifically, we propose a hierarchical gradient clipping with sparsified
updating to defend against existing poisoning attacks. The empirical results
demonstrate that the proposed defense mechanism improves the robustness of
FedRecs. Comment: This paper has been accepted by SIGIR202
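As a rough illustration of the defense idea in the abstract above, the sketch below combines plain L2-norm gradient clipping with top-k sparsification of a client update. It is an assumption-laden simplification: the paper's hierarchical clipping scheme is not described here, and the function name `clip_and_sparsify` and its parameters `max_norm` and `keep` are hypothetical.

```python
import math


def clip_and_sparsify(grad, max_norm, keep):
    """Clip a gradient to an L2 norm bound, then keep only the largest entries.

    Hypothetical helper: single-level clipping stands in for the paper's
    hierarchical clipping, whose details are not given in the abstract.
    """
    norm = math.sqrt(sum(g * g for g in grad))
    # Scale down only when the gradient exceeds the norm bound.
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    clipped = [g * scale for g in grad]
    # Indices of the `keep` entries with the largest magnitude.
    order = sorted(range(len(clipped)), key=lambda i: abs(clipped[i]), reverse=True)
    kept = set(order[:keep])
    # Zero out everything else so that only a sparse update is uploaded.
    return [v if i in kept else 0.0 for i, v in enumerate(clipped)]


update = clip_and_sparsify([3.0, -4.0, 0.5, 0.1], max_norm=1.0, keep=2)
```

Bounding the norm limits how far a single client can push the global model, and sparsification limits how many parameters it can touch per round.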
Learning Compact Compositional Embeddings via Regularized Pruning for Recommendation
Latent factor models are the dominant backbones of contemporary recommender
systems (RSs) given their performance advantages, where a unique vector
embedding with a fixed dimensionality (e.g., 128) is required to represent each
entity (commonly a user/item). Due to the large number of users and items on
e-commerce sites, the embedding table is arguably the least memory-efficient
component of RSs. For any lightweight recommender that aims to efficiently
scale with the growing size of users/items or to remain applicable in
resource-constrained settings, existing solutions either reduce the number of
embeddings needed via hashing, or sparsify the full embedding table to switch
off selected embedding dimensions. However, as hash collision arises or
embeddings become overly sparse, especially when adapting to a tighter memory
budget, those lightweight recommenders inevitably have to compromise their
accuracy. To this end, we propose a novel compact embedding framework for RSs,
namely Compositional Embedding with Regularized Pruning (CERP). Specifically,
CERP represents each entity by combining a pair of embeddings from two
independent, substantially smaller meta-embedding tables, which are then
jointly pruned via a learnable element-wise threshold. In addition, we
innovatively design a regularized pruning mechanism in CERP, such that the two
sparsified meta-embedding tables are encouraged to encode information that is
mutually complementary. Given the compatibility with agnostic latent factor
models, we pair CERP with two popular recommendation models for extensive
experiments, where results on two real-world datasets under different memory
budgets demonstrate its superiority against state-of-the-art baselines. The
codebase of CERP is available at https://github.com/xurong-liang/CERP. Comment: Accepted by ICDM'2
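The compositional idea in the abstract above can be sketched as follows. This is not the CERP codebase: the quotient/remainder index mapping, the sum used to combine the two rows, the hard thresholding, and the fixed threshold value 0.2 are all assumptions for illustration (in the paper the thresholds are learned, and the regularizer is omitted here).

```python
import random


def build_table(rows, dim, rng):
    """Create a small meta-embedding table with random values."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(rows)]


def prune(vec, thresholds):
    """Zero out entries whose magnitude does not exceed the element-wise threshold."""
    return [v if abs(v) > t else 0.0 for v, t in zip(vec, thresholds)]


def compose(entity_id, table_q, table_r, thr_q, thr_r):
    """Build one entity embedding from a pair of pruned meta-embedding rows."""
    # Assumed index mapping: quotient selects a row in one table, remainder in the other.
    q, r = divmod(entity_id, len(table_r))
    row_q = prune(table_q[q], thr_q[q])
    row_r = prune(table_r[r], thr_r[r])
    # Assumed combination: element-wise sum of the two sparsified rows.
    return [a + b for a, b in zip(row_q, row_r)]


rng = random.Random(0)
num_entities, dim, buckets = 1000, 4, 32
table_r = build_table(buckets, dim, rng)
table_q = build_table(-(-num_entities // buckets), dim, rng)  # ceiling division
# Fixed thresholds stand in for the learnable ones described in the abstract.
thr_q = [[0.2] * dim for _ in table_q]
thr_r = [[0.2] * dim for _ in table_r]
emb = compose(737, table_q, table_r, thr_q, thr_r)
```

With these toy sizes, 1000 entities share 64 meta-embedding rows in total, while each entity still maps to a distinct pair of rows.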
An Evaluation of Model-Based Approaches to Sensor Data Compression
As the volumes of sensor data being accumulated are likely to soar, data compression has become essential in a wide range of sensor-data applications. This has led to a plethora of data compression techniques for sensor data; in particular, model-based approaches have been spotlighted due to their significant compression performance. These methods, however, have never been compared and analyzed under the same setting, rendering a "right" choice of compression technique for a particular application very difficult. Addressing this problem, this paper presents a benchmark that offers a comprehensive empirical comparison of the model-based compression techniques. Specifically, we re-implemented several state-of-the-art methods in a comparable manner, and measured various performance factors with our benchmark, including compression ratio, computation time, model maintenance cost, approximation quality, and robustness to noisy data. We then provide in-depth analysis of the benchmark results, obtained by using 11 different real datasets consisting of 346 heterogeneous sensor data signals. We believe that the findings from the benchmark will be able to serve as a practical guideline for applications that need to compress sensor data.
Result Selection and Summarization for Web Table Search
The amount of information available on the Web has been growing dramatically, raising the importance of techniques for searching the Web. Recently, Web Tables emerged as a model that enables users to search for information in a structured way. However, effective presentation of results for Web Table search requires (1) selecting a ranking of tables that acknowledges the diversity within the search result; and (2) summarizing the information content of the selected tables concisely but meaningfully. In this paper, we formalize these requirements as the diversified table selection problem and the structured table summarization problem. We show that both problems are computationally intractable and, thus, present heuristic algorithms to solve them. For these algorithms, we prove salient performance guarantees, such as near-optimality, stability, and fairness. Our experiments with real-world collections of thousands of Web Tables highlight the scalability of our techniques. We achieve improvements of up to 50% in diversity and 10% in relevance over baselines for Web Table selection, and reduce the information loss induced by table summarization by up to 50%. In a user study, we observed that our techniques are preferred over alternative solutions.
Robust and Hierarchical Stop Discovery in Sparse and Diverse Trajectories
Advances in GPS tracking techniques have produced a large amount of trajectory data. To better understand such mobility data, semantic models like "stop/move" (or inferring "activity" and "transportation mode") have recently become a hot topic in trajectory data analysis. Stops are important parts of trajectories, such as "working at office", "shopping in a mall", or "waiting for the bus". Several methods, such as velocity-based, clustering-based, and density-based algorithms, have been designed to discover stops. However, existing works focus on well-defined trajectories like the movement of vehicles and taxis, and do not work well for heterogeneous cases like diverse and sparse trajectories. In contrast, our paper addresses three main challenges: (1) providing a robust clustering-based method to discover stops; (2) discovering both shared stops and personalized stops, where shared stops are common places where many trajectories pass and stay for a while (e.g. a shopping mall), whilst personalized stops are individual places where a user stays for his/her own purpose (e.g. home, office); (3) further building a stop hierarchy (e.g. a big stop like the EPFL campus and a small stop like an office building). We evaluate our approach with several diverse and sparse real-life GPS datasets, compare it with other methods, and show its better data abstraction of trajectories.
Privacy-Preserving Schema Reuse
As the number of schema repositories grows rapidly and several web-based platforms exist to support publishing schemas, schema reuse has become a new trend. Schema reuse is a methodology that allows users to create new schemas by copying and adapting existing ones. This methodology helps reduce not only the effort of designing new schemas but also the heterogeneity between them. One of the biggest barriers to schema reuse is privacy concerns that discourage schema owners from contributing their schemas. Addressing this problem, we develop a framework that enables privacy-preserving schema reuse. Our framework lets contributors define their own protection policies in the form of privacy constraints. Instead of showing original schemas, the framework returns an anonymized schema with maximal utility while satisfying these privacy constraints. To validate our approach, we empirically show the efficiency of different heuristics, the correctness of the proposed utility function, the computation time, as well as the trade-off between utility and privacy.
DREAM: Adaptive Reinforcement Learning based on Attention Mechanism for Temporal Knowledge Graph Reasoning
Temporal knowledge graphs (TKGs) model the temporal evolution of events and
have recently attracted increasing attention. Since TKGs are intrinsically
incomplete, it is necessary to reason out missing elements. Although existing
TKG reasoning methods have the ability to predict missing future events, they
fail to generate explicit reasoning paths and lack explainability. As
reinforcement learning (RL) for multi-hop reasoning on traditional knowledge
graphs starts showing superior explainability and performance in recent
advances, it has opened up opportunities for exploring RL techniques on TKG
reasoning. However, the performance of RL-based TKG reasoning methods is
limited due to: (1) lack of ability to capture temporal evolution and semantic
dependence jointly; (2) excessive reliance on manually designed rewards. To
overcome these challenges, we propose an adaptive reinforcement learning model
based on attention mechanism (DREAM) to predict missing elements in the future.
Specifically, the model contains two components: (1) a multi-faceted attention
representation learning method that captures semantic dependence and temporal
evolution jointly; (2) an adaptive RL framework that conducts multi-hop
reasoning by adaptively learning the reward functions. Experimental results
demonstrate DREAM outperforms state-of-the-art models on public datasets. Comment: 11 pages